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1. Introduction 

5_! ■ Let Ti, T 2 , T 3 be positive integers, TZ = [0,Ti] x [0,T 2 ] x [0,T 3 ] be a rectangular 

region and {Xy*fe|l < i < T%, 1 < j < T 2 , 1 < k < T3} be a family of independent 
and identically distributed integer valued random variables from a specified distri- 
bution. In practice, Xijk can be interpreted as the number of events that occur in 
the elementary subregion njk — [i — l,i]x\j — l,j]x[k — l,k]. Foreachj e {1,2,3}, 
consider the positive integers rrij such that 2 < rrij <Tj — X, and define the random 
variables 



in 
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ii+mi-l 22+m2 — 1 13+7713 — 1 

*u<,= E E E x m> 1 'j- r - "'■ ■ L (i-i) 

i=ii J=i2 fe=i3 

; 

as the number of events occurring in the rectangular region 
7t(ii,i2,h) = [h — +mi — 1] X [12 - 1, «2 + m 2 - 1] x [i 3 - l,i 3 + m 3 - 1]. 

The three dimensional discrete scan statistic is defined as the maximum number of 
events in any rectangle TZ(i±, 12, is) within the region TZ, 
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CO 



> 
in 

max Yi x i 2 i 3 . (1-2) 

h--**. 1 < < Tj — m j + 1 

^ I je{i,2,3} 

' The distribution of scan statistics, 

V(S mi .rn 2 ,m 3 (Ti,T 2 ,T 3 ) < ti) , u G {1, 2, . . . ,m 1 m 2 m 3 } 

is used with success in astronomy (Darling and Waterman [1986]), image analysis 
and reliability theory (Boutsikas and Koutras [2000]) and many other domains. 
For an overview of the potential application of scan statistics one can refer to the 
rS ■ monographs of Glaz, Naus and Wallcnstcin [2001] and more recently the one of 

, Glaz, Pozdnyakov and Wallenstein [2009]. 

From a statistical point of view, the scan statistic S , mi)TO2!m3 (Ti,T2,r 3 ) is used for 
testing the null hypothesis of randomness that X^s are independent and iden- 
tically distributed according to some specified distribution. Under the alternative 
hypothesis there exists one cluster location where the Xijk's have a larger mean than 
outside the cluster. As an example, in the Poisson model, the null hypothesis, Ho, 
assumes that Xijk's are i.i.d. with X^k ~ Pois(X) whereas the alternative hypoth- 
esis of clustering, Hi, assumes the existence of a rectangular subregion 7Z(io,jo, &o) 
such that for any ig < i < iq + mi — 1 , jo < j < jo + fn-2 — 1 and fco < fc < fco + m 3 — 1 , 
X^k are i.i.d. Poisson random variables with parameter A' > A. Outside the region 
^(*0i joj ko), X^k are i.i.d. distributed according to the distribution specified by 
the null hypothesis. The generalized likelihood ratio test rejects Ho in favor of the 
local change alternative Hi, whenever iS mi;lri2im3 (Ti, T 2 , T 3 ) exceeds the threshold 
t determined from P (S mi , m2 , m3 (Ti, T 2 , T 3 ) > t\Hq) — a and where a represents 

l 
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the significance level of the testing procedure (Glaz, Naus and Wallenstein [2001, 
Chapter 13]). 

Since there are no exact formulas available for the distribution of three dimen- 
sional scan statistics, approximation methods are necessary. For the Bernoulli 
model, Glaz, Guerriero and Sen [2010] propose four approximation formulas: one 
Markov like product type approximation and three Poisson type approximations 
that extends the special case described by Darling and Waterman [1986] when 
n = m\m 2 m-s. 

The advantage of the method described in this paper is that it can be used for any 
distribution of the random field and provides accurate approximations and sharp 
error bounds. The methodology used to obtain the approximation and the error 
bounds is presented in Section 2. In Section 3 we describe adapt the importance 
sampling algorithm developed by Naiman and Priebe [2001] to estimate the simu- 
lation error. A simulation study is conducted in Section 4 for considered Bernoulli, 
binomial and Poisson models. Concluding remarks are given in Section 5. 



2. Methodology 



In order to approximate the distribution of S mi ^ m2 . m3 (T\, T 2 , T3) we use a similar 
approach as in Haiman and Preda [2006]. The key idea is to observe that we can 
write the scan statistic random variable as a maximum of 1-dependent stationary 
sequence of random variables. A sequence (Zk)k>i is m-dependent, to > 1, if for any 
h > 1 the cr-fields generated by {Z\, . . . , Zh} and {Zh+m+i, ■ ■ • } are independent. 
The method is based on the following result developed in Haiman [1999, Theorem 
4] and improved in Amarioarei [2012, Theorem 2.6]: 

Let (Zk)k>i be a strictly stationary 1-dependent sequence of random variables and 
for x < sup{w|P(Zi < u) < 1}, let 



q m (x) = P(max(Zi, . . . , Z rn ) < x). 



(2.1) 



Theorem 2.1. For all x such that qi(x) > 1 — a > 0.9, the following approximation 
formula holds: 



2q x - q 2 



[1 + qi - <72 + 2(</i - q 2 ) 2 }' 



< mF(a, m)(l — q\Y 



with 



1 + — 



3_ 

rn 



T(a) 



rn. 



K(a) 



F(a, to) 
where T(a) = L{a) + E(a), 

ll-3a 1 oKl 1 o„ \ 2+3la-a(2-la)(l+la) 

(t^f + M ^ + 6a > — [i- a (i+i a rr — 



(1-91) 



2et(l+;q) 



K{a) 

1 ~ [l-a(l+la) 2 ] 2 

L(a) = 3K(a)(l +a + 3a 2 )[l + a + 3a 2 + K(a)a 3 } + a 6 K 3 (a) 
+ 9a(4 + 3a + 3a 2 ) + 55.1 



E{a) 



■n 5 [1 + (1 - 2a)??] 4 [1 + a(r] - 2)] [l + 7; + (1 - 3q)?? 2 
2(1 - arf Y [(1 - a-q 2 ) 2 - ar] 2 (l + rj - 2ar 1 ) 2 } 



(2.2) 
(2.3) 

(2.4) 

(2.5) 
(2.6) 



and where r/ = 1 + la with I = 1(a) > t\{a) and t 2 (a) the second root in magnitude 
of the equation at 3 —t+1 = 0. 



In this section we obtain an approximation formula for the distribution of scan 
statistic defined by Eq.(1.2) in three steps as follows. 
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Let assume that Lj = n ^ J _ 1 , j 6 {1, 2, 3}, are positive integers and define for each 
k E {1, 2, . . . , L 3 — 1} the random variables 

Z k = max Y ili2 i 3 . (2.7) 

l<ii<(Zi-l)(mi-l) 
Ki 2 <(t 2 -l)(m 2 -l) 
(k-l)(m 3 -l)+l<is<fc(m3-l) 

The set of random variables {Zi, . . . , Zi 3 _i} forms a 1-dependent stationary se- 
quence. Indeed, from Eq.(2.7) and the independence of Xy; we observe that for 
any k > 1, <r(- • • , and a{Z k+ 2, ■ • • ) arc included in <7({Xy;|l < i < Ti, 1 < j < 
T 2 , 1 < I < (fc+l)(m 3 -l)}) and < i < T 1; 1 < j < T 2 , (fc+l)(m 3 -l) + l < 

£}), respectively, which are independent (see Fig. 1). 




Figure 1 . Illustration of Zj~ emphasizing the 1-dependence 



Notice that from Eq.(1.2) and Eq.(2.7) we have 
S(L 

Take for s E {2, 3} 



S(Li, L2, £3) — S mi .m 2 ,m 3 {T\, T2, T 3 ) — max Zk- 

l<k<L3— 1 



Q s = O s (n)=P^f|{Z fe <n}^J 



max ^ii 2 i 3 < n 

l<ii<(ii-l)(mi-l) 
1 l<i2<(i 2 -l)(m 2 -l) 
\ l<i 3 <(s-l)(m 3 -l) / 



(2.8) 



(2.9) 



Notice that in the notation of Eq.(2.1) we have Q s = q s -i- For n such that 
Qzin) > 1 — «i > 0.9 we apply the result in Theorem 2.1 to obtain the first step 
approximation 

2Q 2 - Q3 



P(5(Li,L 2 ,L 3 ) <n) 



[l + 02 -03 + 2(Q 2 -Q 3 ) 2 ] (L3 " 1): 



(2.10) 
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with an error bound of (L 3 — l)F{a\,L 3 — 1)(1 — Q2) 2 ■ Observe that Q2 and Q 3 
represents the distribution of the scan statistics over the rectangular subregions 
[l,Ti] x [1,T 2 ] x [l,2(ms-l)] and [1,T X ] x [1,T 2 ] x [1, 3(ro 3 - 1)], respectively 
(see also Fig 1). To simplify the results of the presentation, in what follows we 
abbreviate the approximation formula by 

2x-y 



H(x,y,m) = 



(2.11) 



[l + x-y + 2(x - y) 2 }" 1 " 1 ' 

In order to evaluate the approximation in Eq.(2.10) it is necessary to find approx- 
imations for Q2 and Q 3 . Thus, the second step consists in applying Theorem 2.1 
for each Q s . We define, as in Eq.(2.7), for s G {2, 3} and I e {1, 2, . . . , L 2 — 1} the 
sequences 

Z\ 8) = max y 4l4243 , (2.12) 

' l<ii<(Li-l)(mi-l) 

((-l)(m 2 -l) + l<J2<K«»2-l) 
l<i 3 <(s-l)(m 3 -l) 

which are strictly stationary, 1-dependent and satisfy 



Q s = F(S(L 1 ,L 2 ,s) <n) = pf 
Set for Me {2,3}, 

Qts = Q to (n) = P (f]{Z^ < n}) = 



max Zf < n 

<1<L 2 -1 



(2.13) 



\i=i 



max 

l<ii<(Li-l)(mi-l) 

1 l<< 2 <(t-l)(TO2-l) 

\ l<i 3 <(s-l)(m 3 -l) 



(2.14) 



If the condition Q2s(n) > 1 — «2s > 0.9 is fulfilled, then using Theorem 2.1, we 
find, for s £ {2,3}, the approximation 

\Q s -H(Q 2s ,Q 3s ,L 2 )\ < (L 2 -l)F(a 2s ,L 2 -l)(l-Q 2s ) 2 . (2.15) 

The last step involves the evaluation of Qts in Eq.(2.15). For s,t € {2,3} and 
j S {1, 2, . . . , L% — 1} let consider the following random sequences 

K,™.. (2.16) 



^ • max j 4142*3 • 

J (j'-l)(mi-l)+l<ii<i(mi-l) 



l<42<(t-l)("i2-l) 
l<t 3 <(s-l)(m 3 -l) 



We observe that ( Z^ ts ^ ) forms 1-dependent stationary sequences and 

V 3 ) •>! 



Q ts =W(S(L 1 ,t,s)<n) = 
Put for r,i, s e {2,3} 



max Z) < 
i<i<ii-i J 



(2.17) 



rt .(n)=P[ ri{^<»}) = 



max ii x i 2 i 3 < n 

l<ii<(r-l)(mi-l) 
l<.*a<(t-l)(ma-l) / 
l<ts<(a-l)(m 3 -l) / 



(2.18) 



Then, under supplementary condition that Q2ts > 1 — c*3ts > 0.9, we apply the 
result in Theorem 2.1 to obtain 

\Qts - H (Q 2ts , Q 3U , Lt)\ < (ii - l)F(a 3ts ,ii - 1)(1 - Q 2ts ) 2 . (2.19) 

Combining the Eqs.(2.10), (2.15) and (2.19) we obtain an approximation formula 
for the distribution of the scan statistic depending on the eight quantities Q r ts, that 
we propose to evaluated by simulation. Note that in the above approximations, at 



each step we consider different values for a. In the next section we show how to 
choose these values. 

Remark 2.2. If T\, T 2 and T 3 are not multiples of mi — 1, m 2 — 1 and to 3 — 

1, respectively, then let consider Lj — 7r ^Li for j G {1,2,3}. Based on the 
inequalities 

F(S(L 1 + l,L 2 + l,L 3 + l)<n)<F<F(S{L 1 ,L 2 ,L 3 )<n), (2.20) 

we can approximate P = P(S l miim2 , m3 (T±, T 2 , T 3 ) < n) by linear interpolation (see 
Table 3). 

2.1. Computing the approximation error. To simplify the presentation and 
the derivation of the approximation formulae, it is convenient to introduce the 
following notations for s, t G {2,3}: 

£*3 = 1 — Q 3 , a 23 = 1 - Q23, «233 = 1 - Q233, 

7 ts = H(Q 2ts ,Q 3ts ,L 1 ), j s = H (72s, 73s, £2), 

F 1 = F{a 3 ,L 3 - 1), F 2 = F{a 23 ,L 2 - 1), F 3 = F{a 233 , L x - 1). 

It is not hard to see that Q 3 < Q 2 , Q 23 < Q 22 and Q233 < Q2ts, so that the 
choice for the thresholds 0:3, a 23 and CK233 becomes natural. Based on the mean 
value theorem in two dimensions, one can easily verify that for m > 6 and yi < X4, 
i G {1,2} we have the inequality: 

\H(xi,yi,m) - H(x 2 ,y 2 ,m)\ < (m - 2) [\xi - x 2 \ + \yi - y 2 \] . (2.21) 

In what follows we use the result from Eq.(2.21) without restrictions. This is in 
agreement with the numerical values considered in Section 4. We begin by observing 
that applying Eq.(2.21) into Eq.(2.10) we obtain 

|P - H (72,73, £ 3 )| <\P-H (Q 2 ,Q 3 ,L 3 )\ + \H (Q 2 , Q 3 , L 3 ) - H ( 72 , 73, L 3 )| 

< (La - l)Fj. (1 - Q 2 f + (L 3 - 2) [\Q 2 - 72 | + |Q 3 - 73 |] , 

(2.22) 

where for simplicity we used the notation P = P (S(Li, L 2 , L 3 ) < n). In the same 
manner, one can see that for s G {2,3} we have 



\Q S - ls\ <\Qs~H (Q 2s ,Q 3s ,L 2 )\ + \H (Q 2 „ Q 3s , L 2 ) - H ( 72s , 73s , L 2 )\ 
< {L 2 - l)F 2 (1 - Q 2s f + (L 2 - 
We notice that Eq.(2.19) can be rewritten as 



< (L 2 - 1)F 2 (1 - Q 2s f + (L 2 - 2) [\Q 2s - 72,| + |Q 3 , - 73s|] ■ (2.23) 



\Qu -lts\<(Li- l)F 3 (l -Q 2ts ), s,tG{2,3}. (2.24) 

Finally, in order to find the approximation error it is sufficient to determine bounds 
for 1 — Q 2 and 1 — Q 2s . It can be easily checked that 

1 - Q 2s < 1 - 72, + \Q2s - 72,1 < S 2s (2.25) 

where 

S 2s = 1 - 72, + [Li - 1)F 3 (1 - Q22,) 2 . (2.26) 
Similarly, we can write 

I-Q2 < I-72 + IQ2-72I <S 2l (2.27) 

with 

5 2 = l-72 + (i2-l)F 2 5 22 + (i 2 -2)( J L 1 -l)F 3 [(1 - Q 222 ) 2 + (1 - Q 232 f] . (2.28) 
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Substituting Eqs.(2.23), (2.24), (2.25) and (2.27) in Eq.(2.22) we derive the formula 
for the approximation error 

: (L 3 - 1)F 1( 5 2 2 + {La - 2)(L 2 - l)F 2 (S 2 22 + 6% 3 ) + 



E„ 



(L 3 -2){L 2 -2){L 1 -l)F 3 



t,s£{2,3} 



(2.29) 



2.2. Computing the simulation errors. Since, from our knowledge, there are 
no exact formulas available for the computation of Q r ts we propose to evaluate them 
by simulation. It is obvious that the simulation error appears from two terms: first, 
from the approximation formula in Eq.(2.22) and second, from the error bound in 
Eq.(2.29). 

Usually, between the true and the estimated value we have a relation of the form 



iris Wrts 



<(3 rtSl r,M£{2,3} 



(2.30) 



where Q r t s are the simulated values corresponding to Qrts- Provided a simulation 
error bound /3 rts as in Eq.(2.30), let denote the simulated values by 

Qts — H(Q 2ts , Q 3 ts, Li), 
Qs = H(Q 2s ,Q 3s ,L 2 ). 

From Eq.(2.21) one obtains 

# (72,73,L 3 )-ff(Q 2 ,Q 3 ,L3)| < (L 3 -2) [| 72 - Q 2 | + | 73 - Q 3 |] . (2.31) 

Observe that the differences in the right hand term in Eq.(2.31) can be bounded 
by 



7s - Q t 



H (7 2s , 73s, L 2 ) - H {Q 2s ,Q 3s ,L 2 



< (L 2 



72s - Qli 



73s - Qit 



(2.32) 



In the same way we can write for t, s € {2, 3} 



Its 



H ( 72 t s ,73i S! Li) - H [Q 2t s,Q3ts,Li 



< (Li 



72ts — Qlts 



+ 



73ts 



^3ts 



< (Li - 2) [f3 2ts + f3 3ts 



(2.33) 



Combining Eqs.(2.33), (2.32) and (2.31) we get the simulation error corresponding 
to the approximation formula 



E 



s/ = (L!-2)(L 2 -2)(L 3 -2) E ^ rts ■ ( 2 ' 34 ) 

\r,t,se{2,3} / 

In order to obtain the simulation error corresponding to the approximation error 
bound in Eq.(2.29) we follow the lines of Section 2.1. With the following notations 

Urts — 1 — <jrts + ftrts, 

U ts = l- q ts + (Li - 2)(/3 2ts + Asts), 
u s = 1 - q s + (L x - 2)(L 2 - 2)(^2s + /3 32s + fes + (3 33s ), 
hs = u 2s + (L 1 - l)F 3 u\ 2s , 

5 2 =u 2 + (L 2 - 1)F 2 S 2 2 + (L 2 - 2)(L a - l)F 3 (u 2 222 + u 2 232 ), 
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the error can be expressed as 

E sapp = (L 3 - l)Fi% + (L 3 - 2)(L 2 - 1)F 2 (8 2 22 + P 23 ) + 

+ (L 3 -2)(L 2 -2)(L 1 -1)F 3 I ^ ul ts \. (2.35) 

\t,se{2,3} / 

The total simulation error is obtained by adding the two terms from Eq.(2.34) and 
Eq.(2.35) 

B s im = F s f + E sa pp. (2.36) 

To evaluate Eq.(2.36), one needs to find suitable values for the bounds (3 r ts- If 
ITER is the number of iterations used in the Monte Carlo simulation algorithm 
for the estimation of Q r ts then, one can consider, for example, the naive bound 
provided by the Central Limit Theorem with a 95% confidence level 



Prts = ^\j Qrt f TE f ts) - (2-37) 

This bound has been used with some success for the two dimensional case (see 
Haiman and Preda [2006]). As the authors pointed out, the main contribution to 
the total error is due to the simulation error, especially for small sizes of the window 
scan with respect to the scanning region. Our numerical study shows that Eq.(2.37) 
is not feasible for the three dimensional case, the simulation error being to large 
with respect to the approximation error. Thus, for the simulation of Qrts, we use 
an importance sampling technique introduced in Naiman and Priebe [2001]. Next 
section illustrates how to adapt theirs algorithm to our problem. 



3. Simulation by importance sampling 

In this section we present a simulation method for Q r t s , which gives an unbiased 
estimate whose variance is typically smaller then that of the naive hit or miss 
Monte Carlo approach. The method is an adaptation of the importance sampling 
algorithm developed in Naiman and Priebe [2001] to our problem. The main idea 
behind is to express the tail of the scan distribution as a Bonferroni upper bound 
(B) with some correction factor (p). Let define for 1 < ij < Nj, j £ {1,2,3} the 
events A ni2ia = {Y ill2l3 > r}. Then 

/T 1 -m 1 + lT 2 -m2 + lT3-m 3 + l \ 

F(S mi>m2 , m3 (T 1 ,T 2 ,T 3 )>T)=Fl IJ U U A Mj 

\ il = l 12 = 1 »3 = 1 / 

Ti-mi + 1 T 2 -m 2 +l T 3 -m 3 + l 

= b Pn»2 l3 / ( i i^2,i3) 

i 1 = l 12 = 1 13 = 1 

= Bp, (3.1) 

where 

Ti-mi + l T 2 -m 2 + l T 3 -m 3 +l 

P= Y Y P^3 I ( i i,i2,is)- (3.2) 

n=i «2=i 23=1 
Under the null hypothesis (Hq), B is the Bonferroni upper bound given by 

Ti-mi+l T 2 -m 2 + l T 3 -m 3 + l 

B = Y Y Y ? (A, i;!i „) 

il = l «2 = 1 *3=1 

= (Ti - mi + 1)(T 2 - m 2 + 1)(T 3 - m 3 + l)P(4i), (3.3) 
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Pi t i 2 i 3 defines an uniform probability distribution over {l,...,Ti — m\ + 1} x 
{l,...,T 2 -m 2 + l} x {l,...,r 3 -m3 + l}, 

ftll2«3 — Tl _ mi+1T2 _ TO2+1T3 _ m3+1 

E E E F ( A ™) 

Si — 1 S2 — 1 S3 — 1 

= (Ti - mi + 1)(T 2 - m 2 + 1)(T 3 - m 3 + 1) ' (3 ' 4) 
/• 1 1 \ 4 . 

and 7(ii,i 2 ,i 3 ) = / 1 2 3 d¥ where C(Y) represents the number of 

J C{Y) P(Ai li2 i 3 ) 

triples («i,i 2 ,« 3 ) such that Yi 1 i 2 i 3 exceeds the threshold r, that is 

Tx-mx + l T 2 -m 2 + l T 3 -m 3 + l 

°( Y ) = E E E !*xw ( 3 - 5 ) 

ii = l i2 = l i3=l 

Based on these identities the simulation algorithm (similar with the one described 
in Naiman and Priebc [2001, page 303]) can be written as follows: 

Begin 

Repeat for each k from 1 to ITER (iterations number) 
Step 1 Generate T € {r, . . . } according to the probabilities 

P(Ym = t) 
Pr(t) = =— -, t > t. 

E p (*m= s ) 

S>T 

Step 2 Conditionally, given T = t, generate the triple (Ji, J 2 , J 3 ) in the 
set {l,...,Ti-mi + l}x{l,...,T 2 -m 2 +l}x{l,...,r 3 -TO 3 + l} 
uniformly. 

Step 3 Conditionally, given T and (Ji, J 2 , J 3 ), generate the set of ran- 
dom variables {Yi^i^Jg < i s < J s + m s — 1,8 € {1,2,3}}, 
uniformly from the set of all the vectors of length mi x m 2 x m 3 
over the set of values taken by Y^^ and whose sum is equal 
with T. Take the remaining Y^^ to be i.i.d. and distributed 
according to the null hypothesis distribution. 

Step 4 Take Ck — C(Yf.) the number of all triples («i,i 2 ,i 3 ) such that 
Y ili2 i 3 > T and put p k = i. 
End Repeat 



ITER 

ITER 

k=l 

End 



Return p = — — — — 



Clearly, p is an unbiased estimator for p with estimated variance 



I ITER / ITE1 

m-i ^ ( ^ ~ ITER ^ 

k=l \ k=l 



ITER \ 2 



Var(p) « YfER~^ > * I '"■ " n r r7> > , 1 • ,:5 -'» 

For ITER sufficiently large, as a consequence of CLT the error between the true 
and the estimated value of the tail IP (S mijjn2im3 (Ti, T 2 , T 3 ) > r), corresponding to 
a 95% confidence level, is given by 



= 1 W^g. (3.7) 
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Notice that for the simulation of Q r t s , we substitute T\, Ti and T3 in the above 
relations with r(m\ — 1), t(m,2 — 1) and 5(7713 — 1) respectively. Therefore, we obtain 
the corresponding values for (3 rts as described by Eq.(3.7). 

4. Numerical values for Binomial and Poisson models 

In this section, for selected values of the parameters of the binomial and Poisson 
distributions, we evaluate the approximation introduced in Section 2 and provide 
the corresponding error bounds. We show the contributions of the approximation 
(Eq.(2.29)) and simulation (Eq.(2.36)) errors in the overall error. 
For all our simulations we used the importance sampling algorithm with ITER = 
10 replications. We compare our results with those existing in literature, see 
Glaz, Guerriero and Sen [2010] for Bernoulli model, and with the simulated value 
of the scan statistics obtained by scanning the whole region 1Z, denoted by P(S < n). 
The scanning of TZ being more time consuming than the scanning of the subregions 
corresponding to Q rs t, we used 10 3 repetitions of the algorithm. 
In Table 1, we compare the results obtained by our approximation with the product 
type approximation presented by Glaz, Guerriero and Sen [2010]. We observe that 
our approximation is very sharp. 





Table 1. 


Approximation 


for ¥(S < n) in Bernoulli case: 


mi — rri2 


= OT3 = 




5, Ti = T 2 = 


T 3 = 60, ITER 


= 10 5 








n 


P(S < n) 


Glaz et al. 


Our 


&app 


R ■ 


Total 






Product type 


Approximation 


Eq.(2.29) 


Eq.(2.36) 


Error 








p = 0.00005 








1 


0.841806 


0.841424 


0.851076 


0.011849 


0.064889 


0.076738 


2 


0.999119 


0.999142 


0.999192 


0.000000 


0.000170 


0.000170 


3 


0.999997 


0.999998 


0.999997 


0.000000 


3 x 10~ 7 


3 x 10- 7 








p = 0.0001 








2 


0.993294 


0.993241 


0.993192 


0.000010 


0.001367 


0.001377 


3 


0.999963 


0.999964 


0.999963 


0.000000 


0.000005 


0.000005 


4 


0.999999 


0.999999 


0.999999 


0.000000 


2 x 10~ 9 


2 x 10~ 9 



Table 2 presents the numerical results obtained by scanning the region TZ of size 
60 x 60 x 60 with two windows of the same volume but different sizes, first a cubic 
window of size 4x4x4 and second a rectangular region of size 8x4x2. We 
observe that the results are closely related, but significantly different. 
In Table 3 we have included numerical values emphasizing the situation described 
by Remark 2.2. We consider the Bernoulli model of parameter p = 0.0001 over 
the region 1Z of size 185 x 185 x 185 and scan it with a cubic window of length 
10. The second and forth columns gives the values corresponding to the bounds 
described in Eq.(2.20), while in the third column we presented the simulated values 
for P (5i ,io,io(185, 185, 185) < n). 

In order to compare the binomial and Poisson models, in Table 4, we have evaluated 
the distribution of the scan statistics over a region of size 84 x 84 x 84 scanned with 
a 4 x 4 x 4 cubic window, in the two situations. In the first case we have a binomial 
random field with parameters m and p, that is X^k ~ B(m,p), while in the second 
we considered that Xijk ~ f (A), with A = rap. 

Notice that the contribution of the approximation error (E app ) to the total error is 
almost negligible in most of the cases with respect to the simulation error (E S i m ). 
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TABLE 2. Approximation for P(S < n) over the region 1Z with windows of the 
same volume by different sizes: T t = T 2 = T 3 = 60, p = 0.0025, ITER = 10 5 



n P(S < n) 



Our 
Approximation 



Eq.(2.29) 



Eq.(2.36) 



mi = m2 



: m 3 



Total 
Error 



0.961691 
0.999006 
0.999980 
0.999999 



0.963506 
0.999023 
0.999980 
0.999999 



0.000038 
0.000000 
0.000000 
0.000000 



0.003622 
0.000071 
0.000001 

2 x 10" 9 



0.003660 
0.000071 
0.000001 

2 x 10" 



-1-9 



8, mi = 4, m3 



5 0.969189 0.969110 0.000007 0.003387 0.003395 

6 0.999297 0.999228 0.000000 0.000071 0.000071 

7 0.999984 0.999984 0.000000 0.000001 0.000001 

8 0.999999 0.999999 0.000000 2 x lO" 9 2 x 10 



TABLE 3. Approximation for P(S < n) based on Eq.(2.20): mi — m.2 
10, Ti = T 2 = T 3 = 185, Li = L 2 = L 3 = 20, ITER = 10 5 



n 


P(5(Li + 


1,L 2 + 1,L 3 + 1) < n) 


P(5 < n) 


P(S(Li,L 2 ,X 3 ) < n) 


4 




0.97524633 


0.97465263 


0.97491935 






(±0.00754004) 


(±0.00618987) 


(±0.00643099) 


5 




0.99931055 


0.99935163 


0.99938629 






(±0.00015833) 


(±0.00014759) 


(±0.00013490) 


6 




0.99998641 


0.99998632 


0.99998784 






(±0.00000272) 


(±0.00000326) 


(±0.00000230) 


TABLE 4. Approximation for ¥(S < n) in Binomial and Poisson cases: mi — 


m,2 


= m 3 = 4, Tt = T 2 = T 3 = 84, ITER 


= 10 5 






n 


P(S < n) 


Our 


Eapp 




Total 






Approximation 


Eq.(2.29) 


Eq.(2.36) 


Error 






Binomial : m = 


10, p = 0.0025 






10 


0.726386 


0.723224 


0.007763 


0.032197 


0.039960 


11 


0.954605 


0.955417 


0.000123 


0.003079 


0.003202 


12 


0.993938 


0.993906 


0.000001 


0.000331 


0.000333 


13 


0.999289 


0.999284 


0.000000 


0.000033 


0.000033 


14 


0.999923 


0.999921 


0.000000 


0.000003 


0.000003 


15 


0.999992 


0.999992 


0.000000 


3 x 10~ 7 


3 x 10~ 7 






Poisson : 


A = 0.025 






10 


0.713184 


0.708481 


0.009211 


0.035294 


0.044506 


11 


0.950947 


0.950197 


0.000143 


0.003345 


0.003488 


12 


0.993624 


0.993452 


0.000002 


0.000365 


0.000367 


13 


0.999218 


0.999210 


0.000000 


0.000038 


0.000038 


14 


0.999912 


0.999911 


0.000000 


0.000003 


0.000003 


15 


0.999990 


0.999990 


0.000000 


3 x 10~ 7 


3 x 10" 7 



Thus, the precision of the method will depend mostly on the number of iterations 
(ITER) used to estimate Q r ts- 
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The time required for the computations presented in this section was about two 
hours for each table on a computer of medium performances. The programs are 
written in MATLAB and are available from the authors. 

5. Conclusions 

In this article we derived an approximation for the three dimensional discrete scan 
statistic viewed as the maximum of a 1-dependent stationary sequence of ran- 
dom variables. We also provide the corresponding theoretical and simulation error 
bounds. In the three dimensional scan statistics framework, it is essential to reduce 
the variance of simulated values. For this purpose we used an importance sam- 
pling method. A simulation study for the binomial and Poisson models shows the 
accuracy as well as the limit of our method. 
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